Diversity patterns of the Flora of Greece with applications in R
You can find all files in github.
1 Before we start
Before you start downloading data from the internet, you need to create a project in R-Studio. The name of the project, as well as its file path should NOT include the following:
1. Greek letters
2. spaces (replace the spaces with _)
For example your file path could be: C:/Environmental_Data_Analysis/Geographical_Data.
You also need to install and load all the packages that we are going to use today1.
2 Geographical data
2.1 Introduction
In general, geographical data can be divided in two main categories:
1. Vector data
2. Raster data
2.1.1 Vector data
Vector data can be further subdivided into three categories:
1. Points
2. Lines
3. Polygons
These data have discrete, well-defined borders and usually have a high level of precision, but are not necessarily accurate.
Points located within a coordinate reference system (CRS) constitude the basis of vector data. Points can either be self-standing features (e.g., Ioannina city) or they can be linked together to form more complex geometries, such as lines (e.g., the national road connecting Ioannina to Thessaloniki) and polygons (e.g., the area covered by the city of Ioannina)2.
The main packages we are going to use when dealing with vector data are the following:
1. sf
2. sp
3. rgeos
4. rgdal
You can see the sf vignettes here and the sp vignettes here.
A great resource for spatial data analysis is this book by Bivand et al. (2008), as well as this one3.
2.1.2 Raster data
On the other hand, raster data divide the surface up into cells of constant size (e.g., 1 x 1 km2 cells). Raster datasets are the basis of background images used in web-mapping and have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices. Rasters aggregate spatially specific features to a given resolution, meaning that they are consistent over space and scalable (Lovelace et al., 2019).
The geographic raster data model usually consists of a raster header and a matrix (with rows and columns) representing equally spaced cells (i.e., pixels). The raster header defines the coordinate reference system, the extent and the origin. The header defines the extent via the number of columns, the number of rows and the cell size resolution. Hence, starting from the origin, we can easily access and modify each single cell by either using the ID of a cell or by explicitly specifying the rows and columns. In contrast to vector data, the cell of one raster layer can only hold a single value. The value might be numeric or categorical.
Raster maps usually represent continuous phenomena, such as elevation, temperature, population density or spectral data. Discrete features such as soil or land-cover classes can also be represented as raster data. Consequently, the discrete borders of these features become blurred, and depending on the spatial task a vector representation might be more suitable.
In general, there are three raster classes:
1. RasterLayer
2. RasterBrick
3. RasterStack
The RasterLayer class represents the simplest form of a raster object, and consists of only one layer.
Both RasterBrick and RasterStack classes can handle multiple layers, but differ regarding the number of supported file formats, type of internal representation and processing speed.
A RasterBrick consists of multiple layers, which typically correspond to a single multispectral satellite file or a single multilayer object in memory.
A RasterStack is similar to a RasterBrick in the sense that it consists also of multiple layers. However, in contrast to RasterBrick, RasterStack allows you to connect several raster objects stored in different files or multiple objects in memory. More specifically, a RasterStack is a list of RasterLayer objects with the same extent and resolution (Lovelace et al., 2019).
3 Download vector data
3.1 Load packages
We will have to load some libraries first:
## ===========================================================================##
## Load them
## ===========================================================================##
library(tidyverse)
library(rgbif)
library(raster)
library(usdm)
library(rgeos)
library(rgdal)
library(rinat)
library(wdpar)
library(countrycode)
library(devtools)
library(sf)
library(dismo)
library(CoordinateCleaner)
library(scales)
library(scrubr)
library(mapr)
library(rasterVis)
library(data.table)
## ===========================================================================##3.2 Polygon data
3.2.1 Country boundaries
Then we can download some vector data in the form of polygons. We can use the online database GADM to download country boundaries.
## ===========================================================================##
## Download the data for Greece
## ===========================================================================##
## There are three levels available, you are free to explore them
Greece <- raster::getData("GADM", country = "GRC", level = 3)
## ===========================================================================##We can easily plot the data we have just downloaded.
## ===========================================================================##
## Plot Greece
## ===========================================================================##
plot(Greece)It is equally easy to subset those data.
## ===========================================================================##
## First we need to see what is inside the object we created
## ===========================================================================##
Greece## class : SpatialPolygonsDataFrame
## features : 326
## extent : 19.37236, 29.6457, 34.80069, 41.74801 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 16
## names : GID_0, NAME_0, GID_1, NAME_1, NL_NAME_1, GID_2, NAME_2, NL_NAME_2, GID_3, NAME_3, VARNAME_3, NL_NAME_3, TYPE_3, ENGTYPE_3, CC_3, ...
## min values : GRC, Greece, GRC.1_1, Aegean, <U+0386><U+03B8><U+03C9><U+03C2>, GRC.1.1_1, Athos, <U+0386><U+03B8><U+03C9><U+03C2>, GRC.1.1.1_1, Abdera, Abelokipi-Menemeni, <U+0386><U+03B8><U+03C9><U+03C2>, Dímos, Municipality, NA, ...
## max values : GRC, Greece, GRC.8_1, Thessaly and Central Greece, Tessa<U+03BB><U+03AF>a <U+03BA>a<U+03B9> Ste<U+03C1>e<U+03AC> <U+0395><U+03BB><U+03BB><U+03AC>da, GRC.8.2_1, West Macedonia, Tessa<U+03BB><U+03AF>a, GRC.8.2.9_1, Zografou, Zografos, Tessa<U+03BB><U+03BF><U+03BD><U+03AF><U+03BA><U+03B7><U+03C2>, Dímos, Municipality, NA, ...
We see that our object has a Coordinate Reference System (crs) and that it includes several columns (as indicated by the slot names).
Next, we are going to see how many columns our object includes:
## [1] "GID_0" "NAME_0" "GID_1" "NAME_1" "NL_NAME_1" "GID_2"
## [7] "NAME_2" "NL_NAME_2" "GID_3" "NAME_3" "VARNAME_3" "NL_NAME_3"
## [13] "TYPE_3" "ENGTYPE_3" "CC_3" "HASC_3"
Finally, we are going to create a new spatial object by subsetting Greece. We do so, by using one of the columns present in Greece.
Let’s plot Crete now
## ===========================================================================##
## Plot Crete
## ===========================================================================##
plot(Crete)Let’s load some data for Italy now, this time using the sf R package.
Please note that you will have to change the file path.
3.2.2 Protected areas
We can also download data from the World Database on Protected areas, the most comprehensive global dataset of protected areas. It is used to monitor the performance of existing protected areas and pinpoint priority areas for establishing new protected areas. We will use the wdpar package to download data for Greece.
## ===========================================================================##
## Download protected area data for Greece (it might take a while)
## ===========================================================================##
gr_raw_pa_data <- wdpa_fetch("Greece", wait = TRUE)
## ===========================================================================##
## ===========================================================================##
## Let's see what we have downloaded
## ===========================================================================##
## head(gr_raw_pa_data)If you need more information on what these columns mean, please refer to the WPDA manual.
We need to change the CRS for plotting. Currently the CRS of gr_raw_pa_data is set to Greek Grid (else known as EGSA 87). We need to change it to WGS 84 (also known as World Geodetic System). We will use the function st_transform from the sf package. The number 4326 refers to WGS 84 in the European Data Petroleum System or EPSG.
## ===========================================================================##
## Reproject data to longitude/latitude for plotting
## ===========================================================================##
gr_pa_data <- st_transform(gr_raw_pa_data, 4326)As a next step, we will clip the terrestrial protected areas to the Greek coastline. First, in order to be on the safe side, we will perform some sanity checks on Greek boundaries and then we will do the clipping.
##===========================================================================##
## Sanity checks
##===========================================================================##
gr_boundary_data <- Greece %>% st_as_sf() %>%
st_set_precision(1000) %>%
st_make_valid() %>%
st_set_precision(1000) %>%
st_combine() %>%
st_union() %>%
st_set_precision(1000) %>%
st_make_valid() %>%
st_transform(st_crs(gr_pa_data)) %>%
st_make_valid()
##===========================================================================##
##===========================================================================##
## clip Greece's protected areas to the coastline
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
filter(MARINE == "terrestrial") %>%
st_intersection(gr_boundary_data) %>%
rbind(gr_pa_data %>%
filter(MARINE == "marine") %>%
st_difference(gr_boundary_data)) %>%
rbind(gr_pa_data %>% filter(!MARINE %in% c("terrestrial",
"marine")))
##===========================================================================##
Now, we can recalculate the extent of each protected area.
## ===========================================================================##
## Recalculate extent
## ===========================================================================##
gr_pa_data <- gr_pa_data %>% mutate(AREA_KM2 = as.numeric(st_area(.)) * 1e-06) ## What is this number?
## ===========================================================================##As a next step, we can also calculate the percentage of land inside its protected area system that are managed under different categories, as defined by IUCN).
##===========================================================================##
## Calculate the percentage of land inside the PA system for each IUCN category
##===========================================================================##
statistic <- gr_pa_data %>%
as.data.frame() %>%
dplyr::select(-geometry) %>%
group_by(IUCN_CAT) %>%
summarize(area_km = sum(AREA_KM2)) %>%
ungroup() %>%
mutate(percentage = (area_km / sum(area_km)) * 100) %>%
arrange(desc(area_km))
##===========================================================================##
##===========================================================================##
## Print it
##===========================================================================##
statistic
Now we can keep ony those PAs that have been assigned to each of the IUCN categories (I-VI) and discard all other PAs.
## ===========================================================================##
## Keep certain PAs only
## ===========================================================================##
gr_pa_data_filtered <- gr_pa_data %>% filter(!str_detect(IUCN_CAT, "Not"))
## ===========================================================================##Let’s visualise the results.
## ===========================================================================##
## First convert Greece and Crete to sf objects
## ===========================================================================##
Greece_sf <- st_as_sf(Greece)
Crete_sf <- st_as_sf(Crete)
## ===========================================================================##
## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Greece_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + geom_sf(data = Greece_sf,
fill = NA) + geom_sf(aes(fill = IUCN_CAT), data = gr_pa_data_filtered, inherit.aes = FALSE) +
theme(axis.title = element_blank(), legend.position = "bottom")Now we can restrict the analysis to Crete. We will clip the PA data to the extent of Crete. In order to do so, we will use the st_intersection function from the sf package and then we will visualise the results.
## ===========================================================================##
## Restrict the analysis to Crete
## ===========================================================================##
pa_crete <- st_intersection(gr_pa_data_filtered, Crete_sf)
## ===========================================================================##
## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Crete_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + geom_sf(data = Crete_sf,
fill = NA) + geom_sf(aes(fill = IUCN_CAT), data = pa_crete, inherit.aes = FALSE) +
theme(axis.title = element_blank(), legend.position = "bottom")Finally, if we want to see all the PAs in Crete, we will do the following:
## ===========================================================================##
## Restrict the analysis to Crete
## ===========================================================================##
pa_crete_all <- st_intersection(gr_pa_data, Crete_sf)
## ===========================================================================##
## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Crete_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + geom_sf(data = Crete_sf,
fill = NA) + geom_sf(aes(fill = IUCN_CAT), data = pa_crete_all, inherit.aes = FALSE) +
theme(axis.title = element_blank(), legend.position = "bottom")3.3 Point data
As we have mentioned earlier, the most basic category of vector data are points. In ecology, points most often represent sampling locations or field observation in general. These observations consist primarily species presence in one site. Nowadays, it is fairly easy to obtain species occurrence data through freely available, online databases, such as GBIF, BIEN and iNaturalist. Bear in mind these databases refer to current/recent species occurrences, yet there are others, such as the Paleobiology and Neotoma databases, that contain information regarding fossil data.
First, let’s load the data we have cleaned for Italy.
In order for this to run smoothly, load the rds data from Github.
3.4 Subset point data based on polygons
We can now see if there are any points occurring in a specific polygon (e.g., in any given Italian province). We will use spatial subsetting for this. Spatial subsetting is the process of selecting features of a spatial object based on whether or not they in some way relate in space to another object.
But before we do this spatial subsetting, we must load an excel file that contains only the valid binomials and not taxa above the species level and below the subspecies level (e.g., families, genera, varieties)4.
## ============================================================================##
## First load the excel file with the correct species names
## ============================================================================##
nms <- readxl::read_excel("italian_names.xlsx")
## ============================================================================##
## ============================================================================##
## Keep only one instance per taxon from our occurrence data
## ============================================================================##
nms_flags <- data_it %>% distinct(scientificName) %>% mutate(Taxon = nms$scientificName,
action = nms$action)
## ============================================================================##
## ============================================================================##
## Then filter out any incorrect taxa
## ============================================================================##
data_it <- data_it %>% full_join(nms_flags) %>% as_tibble() %>% filter(!str_detect(action,
"DEL"))
## ============================================================================##
## ============================================================================##
## Then convert the occurrence data from GBIF to a spatial object
## ============================================================================##
data_it_sf <- st_as_sf(data_it, coords = c("decimalLongitude", "decimalLatitude"),
crs = 4326)
## ============================================================================##
## ============================================================================##
## Then create an object representing Pescara and another one representing Padova
## ============================================================================##
Pescara <- Italy %>% filter(NOME_PRO == "PESCARA")
Padova <- Italy %>% filter(NOME_PRO == "PADOVA")
## ============================================================================##
## ============================================================================##
## Then use it for spatial subsetting to return all the occurrences in Sfakia and
## in Rethymno
## ============================================================================##
occs_Pescara <- data_it_sf[Pescara, ]
occs_Padova <- data_it_sf[Padova, ]
## ============================================================================##
## ============================================================================##
## Use it again to obtain the occurrences NOT in Sfakia
## ============================================================================##
occs_not_in_Padova <- data_it_sf[Padova, , op = st_disjoint]
## ============================================================================##Note the empty argument denoted with , , in the preceding code chunk is included to highlight op, the third argument in [ for sf objects. By using st_disjoint, we selected all the occurrences that are not present in the province of Padova.
Let’s see what we have created.
## ============================================================================##
## First we plot Italy
## ============================================================================##
plot(st_geometry(Italy))
## ============================================================================##
## ============================================================================##
## Then we plot Padova, after we convert the polygon into a polyline with st_cast
## ============================================================================##
plot(st_geometry(st_cast(Padova, "MULTILINESTRING")), add = T, col = "red")
## ============================================================================##
## ============================================================================##
## Then we plot the occurrences present in Padova
## ============================================================================##
plot(st_geometry(occs_Padova %>% dplyr::sample_n(100)), add = T, col = "blue")
## ============================================================================##
## ============================================================================##
## And finally we plot the occurrences not present in Padova
## ============================================================================##
plot(st_geometry(occs_not_in_Padova %>% dplyr::sample_n(1000)), add = T, col = "darkgreen")We can also calculate how many occurrences and how many species each Italian province contains.
##============================================================================##
## First we will create a new spatial object containing both the point and the
## polygon data
##============================================================================##
aggr_data <- Italy %>%
st_join(data_it_sf) %>%
dplyr::group_by(NOME_PRO) %>% ## Here we group the data based on the provincial names
## First we create a variable containing the sum of the species in each province
dplyr::summarize(Species_Number = sum(NROW(unique(Taxon))),
## Then we just sum the number of occurrences in each province
Occurrence_Number = n())
##============================================================================##
##============================================================================##
## Save it to disk
##============================================================================##
saveRDS(aggr_data, 'Italian species data.rds')## ============================================================================##
## Let's first plot the occurence number in each Italian province
## ============================================================================##
plot(aggr_data["Occurrence_Number"])## ============================================================================##
## And then plot the species number in each Italian province
## ============================================================================##
plot(aggr_data["Species_Number"])4 Download raster data
Nowadays, there are several freely available raster databases for climate (e.g., WorldClim, CHELSA, Oscillayers, PaleoClim, CliMond, MerraClim, TerraClimate, EuMedClim, Envirem), soil (SoilGrids), topographical (EarthEnv, CGIAR, MERIT), geological (e.g., GLIM) and LULC (Land Cover/Land Use - Luh) data. Of course there are many other databases dealing with NVDI data, habitat heterogeneity, human population density etc.
Today we will deal with climate data from the WorldClim database, as these data can be accessed rather easily through the R interface. WorldClim provides data for 19 environmental variables that have to with temperature and precipitation:
WorldClim environmental variables
| Var. | Description |
|---|---|
| BIO1 | Annual Mean Temperature |
| BIO2 | Mean Diurnal Range |
| BIO3 | Isothermality (BIO2/BIO7) (* 100) |
| BIO4 | Temperature Seasonality (standard deviation *100) |
| BIO5 | Max Temperature of Warmest Month |
| BIO6 | Min Temperature of Coldest Month |
| BIO7 | Temperature Annual Range (BIO5-BIO6) |
| BIO8 | Mean Temperature of Wettest Quarter |
| BIO9 | Mean Temperature of Driest Quarter |
| BIO10 | Mean Temperature of Warmest Quarter |
| BIO11 | Mean Temperature of Coldest Quarter |
| BIO12 | Annual Precipitation |
| BIO13 | Precipitation of Wettest Month |
| BIO14 | Precipitation of Driest Month |
| BIO15 | Precipitation Seasonality (Coefficient of Variation) |
| BIO16 | Precipitation of Wettest Quarter |
| BIO17 | Precipitation of Driest Quarter |
| BIO18 | Precipitation of Warmest Quarter |
| BIO19 | Precipitation of Coldest Quarter |
We can download future and past climate data from WorldClim as well, but we need to specify the Global Circulation Model, the Representative Pathway Scenario and the time period (2050 or 2070) we are interested in. Depending on our area of interest, we should choose carefully which GCMs are better suited for our intended analysis (McSweeney et al., 2015).
Please bear in mind that the following commands may take a while, since we will be downloading ca. 124 MB.
Maybe it would be a good idea NOT to run the following commands now, if your internet connection is not very fast (it may take more than 10 mins) - You can change the
res = 2.5tores = 10if you want to speed it up, but bear in mind that the resolution will be very coarse.
## ============================================================================##
## First, let's download current climate data for the whole planet
## ============================================================================##
current_climate <- raster::getData("worldclim", var = "bio", res = 2.5)
## ============================================================================##
## ============================================================================##
## Then, let's download future climate data
## ============================================================================##
future_climate <- raster::getData("CMIP5", var = "bio", res = 2.5, rcp = 85, model = "CC",
year = 70)
## ============================================================================##Let’s now change the name of the variables in each object we created.
names(current_climate) <- c("bio_1", "bio_2", "bio_3", "bio_4", "bio_5", "bio_6",
"bio_7", "bio_8", "bio_9", "bio_10", "bio_11", "bio_12", "bio_13", "bio_14",
"bio_15", "bio_16", "bio_17", "bio_18", "bio_19")
names(future_climate) <- c("bio_1", "bio_2", "bio_3", "bio_4", "bio_5", "bio_6",
"bio_7", "bio_8", "bio_9", "bio_10", "bio_11", "bio_12", "bio_13", "bio_14",
"bio_15", "bio_16", "bio_17", "bio_18", "bio_19")We need to correct the scale of bio_1 (i.e., Mean annual temperature). You can see here the reason why.
Next we will download elevation data for Italy.
4.1 Crop raster data
We can crop the raster data we have just downloaded to the extent of Italy and then to the extent of some of its provinces. After we crop climate data to the extent of Italy, we can merge them with the altitudinal data. We will be able to do so, because then and only then will both objects (climate and altitude) have the same extent.
## ============================================================================##
## First for Italy
## ============================================================================##
current_climate_crop <- crop(current_climate, Italy, snap = "in") %>% mask(., Italy)
altitude <- crop(altitude, Italy, snap = "in") %>% mask(., Italy) %>%
## We are forcing the extent of altitude to be exactly the same as that of the
## climate data
resample(., current_climate_crop, "bilinear")
## ============================================================================##
## ============================================================================##
## Merge (stack) climate and altitude
## ============================================================================##
current_climate_crop <- stack(current_climate_crop, altitude)
## ============================================================================##
## ============================================================================##
## Then for Padova
## ============================================================================##
current_climate_Padova <- crop(current_climate_crop, Padova, snap = "in") %>% mask(.,
Padova)
## ============================================================================##You can do the same for the future climate data if you want.
Let’s now plot our results.
4.2 Change the resolution
If for some reason, we want to change the resolution of our rasters, we can use the aggregate function5:
4.3 Calculate some statistics
The package raster contains functions for extracting descriptive statistics for entire rasters. Printing a raster object to the console by typing its name returns minimum and maximum values of a raster. The function summary() provides common descriptive statistics (minimum, maximum, quartiles and number of NAs). Further summary operations such as the standard deviation or custom summary statistics can be calculated with the function cellStats().
Raster value statistics can be visualized in a variety of ways. Specific functions such as boxplot(), density(), hist() and pairs() work also with raster objects
## ============================================================================##
## First, some main summary statistics
## ============================================================================##
cellStats(current_climate_crop, "summary")## bio1 bio10 bio11 bio12 bio13 bio14
## Min. -5.50000 9.0000 -112.00000 290.0000 44.0000 0.00000
## 1st Qu. 11.10000 193.0000 25.00000 676.0000 88.0000 20.00000
## Median 13.00000 215.0000 45.00000 813.5000 102.0000 36.00000
## bio15 bio16 bio17 bio18 bio19 bio2
## Min. 7.00000 117.0000 6.0000 22.0000 83.0000 42.0000
## 1st Qu. 22.00000 237.0000 80.0000 90.0000 168.0000 72.0000
## Median 28.00000 275.0000 131.0000 151.0000 196.0000 82.0000
## bio3 bio4 bio5 bio6 bio7 bio8
## Min. 21.00000 4524.00 42.0000 -145.000000 171.0000 -104.0000
## 1st Qu. 29.00000 5698.00 253.0000 -17.000000 235.0000 99.0000
## Median 31.00000 6110.00 278.0000 8.000000 254.0000 121.0000
## bio9 ITA_msk_alt
## Min. -103.0000 -6.3125
## 1st Qu. 46.0000 110.6849
## Median 201.0000 337.8559
## [ reached getOption("max.print") -- omitted 4 rows ]
## ============================================================================##
## Then some density plots for some variables
## ============================================================================##
density(current_climate_crop[[1:2]])## ============================================================================##
## And some boxplots for the same variables
## ============================================================================##
boxplot(current_climate_crop[[1:2]])## ============================================================================##
## We can also calculate the correlation of those variables
## ============================================================================##
pairs(current_climate_crop[[1:2]])You can do the same for other variables as well and you can use the function
plot3D()to visualize the altitudinal data.
4.4 Correlation among raster data
As you may know, if we want to analyse some ecological data, we should not include in our analysis collinear independent variables (IVs). The correlation coefficient of our IVs should not exceed \(> 0.7\). We will now check the collinearity of our IVs and see which of them do not present any collinearity issues.
## 11 variables from the 20 input variables have collinearity problem:
##
## bio6 bio10 bio17 bio16 bio11 bio5 bio7 bio1 bio18 bio12 bio14
##
## After excluding the collinear variables, the linear correlation coefficients ranges between:
## min correlation ( bio9 ~ bio2 ): -0.04715004
## max correlation ( bio4 ~ bio15 ): -0.6800702
##
## ---------- VIFs of the remained variables --------
## Variables VIF
## 1 bio13 4.048064
## 2 bio15 3.052235
## 3 bio19 4.497012
## 4 bio2 42.962093
## 5 bio3 41.237207
## 6 bio4 32.253340
## 7 bio8 4.109454
## 8 bio9 5.322188
## 9 ITA_msk_alt 6.976100
4.5 Raster extraction
Raster extraction is the process of identifying and returning the values associated with a target raster at specific locations, based on a (typically vector) geographic selector object. The results depend on the type of selector used (points, lines or polygons) and arguments passed to the raster::extract() function.
The simplest example is extracting the value of a raster cell at specific points. For this purpose, we will use the point data we subsetted earlier, occs_Padova and occs_Pescara. We will then compare if the species that are both present in these two spatial subsets have significantly different environmental preferences.
Please note that the exact_extract function from the exactextractr R package is way faster than the extract function from the raster R package, if we want to extract data based on polygons.
## ============================================================================##
## Then we convert the sp objects to sf objects
## ============================================================================##
occs_Padova_sf <- occs_Padova %>% st_as_sf()
occs_Pescara_sf <- occs_Pescara %>% st_as_sf()
## ============================================================================##
## ============================================================================##
## Extract abiotic data
## ============================================================================##
Padova_data <- extract(current_climate_crop, occs_Padova_sf, sp = T, df = T) %>%
as.data.frame()
Pescara_data <- extract(current_climate_crop, occs_Pescara_sf, sp = T, df = T) %>%
as.data.frame()
## ============================================================================##
## ============================================================================##
## We create a dataframe containing the mean values for each taxon for the annual
## temperature and precipitation
## ============================================================================##
Padova_data_mean <- Padova_data %>% group_by(Taxon) %>% summarise(bio1_mean = mean(bio1),
bio2_mean = mean(bio2))
Pescara_data_mean <- Pescara_data %>% group_by(Taxon) %>% summarise(bio1_mean = mean(bio1),
bio2_mean = mean(bio2))
## ============================================================================##
## ============================================================================##
## We make sure that we keep ONLY those species that are both present in Padova
## and Pescara
## ============================================================================##
Padova_subset <- setDT(Padova_data_mean)[Taxon %chin% Pescara_data_mean$Taxon]
Pescara_subset <- setDT(Pescara_data_mean)[Taxon %chin% Padova_data_mean$Taxon]
Padova_subset$Taxon <- factor(Padova_subset$Taxon)
Pescara_subset$Taxon <- factor(Pescara_subset$Taxon)
## ============================================================================##
## ============================================================================##
## Check if they are identical
## ============================================================================##
identical(sort(unique(Padova_subset$Taxon)), sort(unique(Pescara_subset$Taxon)))## [1] TRUE
## ============================================================================##
## Convert them to tibbles
## ============================================================================##
Padova_subset_tbl <- Padova_subset %>% arrange(Taxon) %>% as_tibble() %>% mutate(Province = "Padova")
Pescara_subset_tbl <- Pescara_subset %>% arrange(Taxon) %>% as_tibble() %>% mutate(Province = "Pescara")
Padova_Pescara <- full_join(Padova_subset_tbl, Pescara_subset_tbl)
Padova_Pescara$Province <- factor(Padova_Pescara$Province)
## ============================================================================##
## ============================================================================##
## Finally, we conduct the ANOVA
## ============================================================================##
summary(aov(bio1_mean ~ Province, data = Padova_Pescara))## Df Sum Sq Mean Sq F value Pr(>F)
## Province 1 29.8 29.848 26.51 3.54e-07 ***
## Residuals 610 686.8 1.126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 40 observations deleted due to missingness
## Df Sum Sq Mean Sq F value Pr(>F)
## Province 1 4023 4023 303.6 <2e-16 ***
## Residuals 610 8084 13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 40 observations deleted due to missingness
Did we find any significant differences?
5 Appendix: R code
##===========================================================================##
## Do NOT run this code
##===========================================================================##
library(knitr)
library(rmdformats)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
cache=TRUE,
prompt=FALSE,
tidy=TRUE,
# comment=NA,
message=FALSE,
warning=FALSE,
eval = TRUE)
opts_knit$set(width=75)
##===========================================================================##
pre {
max-height: 400px;
overflow-y: auto;
}
pre[class] {
max-height: 400px;
}
##===========================================================================##
## Load them
##===========================================================================##
library(tidyverse)
library(rgbif)
library(raster)
library(usdm)
library(rgeos)
library(rgdal)
library(rinat)
library(wdpar)
library(countrycode)
library(devtools)
library(sf)
library(dismo)
library(CoordinateCleaner)
library(scales)
library(scrubr)
library(mapr)
library(rasterVis)
library(data.table)
##===========================================================================##
##===========================================================================##
## Download the data for Greece
##===========================================================================##
## There are three levels available, you are free to explore them
Greece <- raster::getData('GADM', country = 'GRC', level = 3)
##===========================================================================##
##===========================================================================##
## Plot Greece
##===========================================================================##
plot(Greece)
##===========================================================================##
## First we need to see what is inside the object we created
##===========================================================================##
Greece
names(Greece)
Crete <- subset(Greece, NAME_1 == "Crete")
##===========================================================================##
## Plot Crete
##===========================================================================##
plot(Crete)
##===========================================================================##
## Load the Italian provinces
##===========================================================================##
Italy <- read_sf('Italin provinces.shp') %>%
st_transform(4326)
##===========================================================================##
## Download protected area data for Greece (it might take a while)
##===========================================================================##
gr_raw_pa_data <- wdpa_fetch("Greece", wait = TRUE)
##===========================================================================##
##===========================================================================##
## Let's see what we have downloaded
##===========================================================================##
# head(gr_raw_pa_data)
##===========================================================================##
## Reproject data to longitude/latitude for plotting
##===========================================================================##
gr_pa_data <- st_transform(gr_raw_pa_data, 4326)
htmltools::img(src = knitr::image_uri(file.path("G:/Academic/Lessons/EMB/Labs/Labs/TRUE RMD Files/HBS Seminar/Logo_HBS/ebe-LOGO-005-300_1_70.jpg")),
alt = 'logo',
style = 'position:absolute; top:0; right:0; padding:5px;')
##===========================================================================##
## Recalculate extent
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
mutate(AREA_KM2 = as.numeric(st_area(.)) * 1e-6) ## What is this number?
##===========================================================================##
##===========================================================================##
## Calculate the percentage of land inside the PA system for each IUCN category
##===========================================================================##
statistic <- gr_pa_data %>%
as.data.frame() %>%
dplyr::select(-geometry) %>%
group_by(IUCN_CAT) %>%
summarize(area_km = sum(AREA_KM2)) %>%
ungroup() %>%
mutate(percentage = (area_km / sum(area_km)) * 100) %>%
arrange(desc(area_km))
##===========================================================================##
##===========================================================================##
## Print it
##===========================================================================##
statistic
##===========================================================================##
## Keep certain PAs only
##===========================================================================##
gr_pa_data_filtered <- gr_pa_data %>% filter(!str_detect(IUCN_CAT, 'Not'))
##===========================================================================##
##===========================================================================##
## First convert Greece and Crete to sf objects
##===========================================================================##
Greece_sf <- st_as_sf(Greece)
Crete_sf <- st_as_sf(Crete)
##===========================================================================##
##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Greece_sf) +
geom_sf(fill = "antiquewhite1") +
geom_sf() +
geom_sf(data = Greece_sf, fill = NA) +
geom_sf(aes(fill = IUCN_CAT), data = gr_pa_data_filtered, inherit.aes = FALSE) +
theme(axis.title = element_blank(), legend.position = "bottom")
##===========================================================================##
## Restrict the analysis to Crete
##===========================================================================##
pa_crete <- st_intersection(gr_pa_data_filtered, Crete_sf)
##===========================================================================##
##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
geom_sf(fill = "antiquewhite1") +
geom_sf() +
geom_sf(data = Crete_sf, fill = NA) +
geom_sf(aes(fill = IUCN_CAT), data = pa_crete, inherit.aes = FALSE) +
theme(axis.title = element_blank(), legend.position = "bottom")
##===========================================================================##
## Restrict the analysis to Crete
##===========================================================================##
pa_crete_all <- st_intersection(gr_pa_data, Crete_sf)
##===========================================================================##
##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
geom_sf(fill = "antiquewhite1") +
geom_sf() +
geom_sf(data = Crete_sf, fill = NA) +
geom_sf(aes(fill = IUCN_CAT), data = pa_crete_all, inherit.aes = FALSE) +
theme(axis.title = element_blank(), legend.position = "bottom")
data_it <- readRDS('Italian data cleaned.rds')
##============================================================================##
## First load the excel file with the correct species names
##============================================================================##
nms <- readxl::read_excel('italian_names.xlsx')
##============================================================================##
##============================================================================##
## Keep only one instance per taxon from our occurrence data
##============================================================================##
nms_flags <- data_it %>%
distinct(scientificName) %>%
mutate(Taxon = nms$scientificName,
action = nms$action)
##============================================================================##
##============================================================================##
## Then filter out any incorrect taxa
##============================================================================##
data_it <- data_it %>%
full_join(nms_flags) %>%
as_tibble() %>%
filter(!str_detect(action, 'DEL'))
##============================================================================##
##============================================================================##
## Then convert the occurrence data from GBIF to a spatial object
##============================================================================##
data_it_sf <- st_as_sf(data_it,
coords = c('decimalLongitude',
'decimalLatitude'),
crs = 4326)
##============================================================================##
##============================================================================##
## Then create an object representing Pescara and another one representing
## Padova
##============================================================================##
Pescara <- Italy %>% filter(NOME_PRO == 'PESCARA')
Padova <- Italy %>% filter(NOME_PRO == 'PADOVA')
##============================================================================##
##============================================================================##
## Then use it for spatial subsetting to return all the occurrences in Sfakia and
## in Rethymno
##============================================================================##
occs_Pescara <- data_it_sf[Pescara, ]
occs_Padova <- data_it_sf[Padova, ]
##============================================================================##
##============================================================================##
## Use it again to obtain the occurrences NOT in Sfakia
##============================================================================##
occs_not_in_Padova <- data_it_sf[Padova, , op = st_disjoint]
##============================================================================##
##============================================================================##
## First we plot Italy
##============================================================================##
plot(st_geometry(Italy))
##============================================================================##
##============================================================================##
## Then we plot Padova, after we convert the polygon into a polyline with st_cast
##============================================================================##
plot(st_geometry(st_cast(Padova, 'MULTILINESTRING')), add = T, col = 'red')
##============================================================================##
##============================================================================##
## Then we plot the occurrences present in Padova
##============================================================================##
plot(st_geometry(occs_Padova %>% dplyr::sample_n(100)), add = T, col = 'blue')
##============================================================================##
##============================================================================##
## And finally we plot the occurrences not present in Padova
##============================================================================##
plot(st_geometry(occs_not_in_Padova %>% dplyr::sample_n(1000)), add = T, col = 'darkgreen')
aggr_data <- readRDS('Italian species data.rds')
##============================================================================##
## First we will create a new spatial object containing both the point and the
## polygon data
##============================================================================##
aggr_data <- Italy %>%
st_join(data_it_sf) %>%
dplyr::group_by(NOME_PRO) %>% ## Here we group the data based on the provincial names
## First we create a variable containing the sum of the species in each province
dplyr::summarize(Species_Number = sum(NROW(unique(Taxon))),
## Then we just sum the number of occurrences in each province
Occurrence_Number = n())
##============================================================================##
##============================================================================##
## Save it to disk
##============================================================================##
saveRDS(aggr_data, 'Italian species data.rds')
##============================================================================##
## Let's first plot the occurence number in each Italian province
##============================================================================##
plot(aggr_data['Occurrence_Number'])
##============================================================================##
## And then plot the species number in each Italian province
##============================================================================##
plot(aggr_data['Species_Number'])
##============================================================================##
## First, let's download current climate data for the whole planet
##============================================================================##
current_climate <- raster::getData('worldclim', var = 'bio', res = 2.5)
##============================================================================##
##============================================================================##
## Then, let's download future climate data
##============================================================================##
future_climate <- raster::getData('CMIP5', var = 'bio', res = 2.5,
rcp = 85, model = 'CC', year = 70)
##============================================================================##
names(current_climate) <- c('bio_1', 'bio_2', 'bio_3', 'bio_4', 'bio_5',
'bio_6', 'bio_7', 'bio_8', 'bio_9', 'bio_10',
'bio_11', 'bio_12', 'bio_13', 'bio_14', 'bio_15',
'bio_16', 'bio_17', 'bio_18', 'bio_19')
names(future_climate) <- c('bio_1', 'bio_2', 'bio_3', 'bio_4', 'bio_5',
'bio_6', 'bio_7', 'bio_8', 'bio_9', 'bio_10',
'bio_11', 'bio_12', 'bio_13', 'bio_14', 'bio_15',
'bio_16', 'bio_17', 'bio_18', 'bio_19')
gain(current_climate$bio_1) = 0.1
gain(future_climate$bio_1) = 0.1
altitude <- raster::getData('alt', country = 'ITA', mask = TRUE)
current_climate_crop <- readRDS('Italy WC.rds')
altitude <- raster('ITA_msk_alt.grd')
##============================================================================##
## First for Italy
##============================================================================##
current_climate_crop <- crop(current_climate,
Italy,
snap = 'in') %>%
mask(., Italy)
altitude <- crop(altitude,
Italy,
snap = 'in') %>%
mask(., Italy) %>%
## We are forcing the extent of altitude to be exactly the same as that of the
## climate data
resample(., current_climate_crop, 'bilinear')
##============================================================================##
##============================================================================##
## Merge (stack) climate and altitude
##============================================================================##
current_climate_crop <- stack(current_climate_crop, altitude)
##============================================================================##
##============================================================================##
## Then for Padova
##============================================================================##
current_climate_Padova <- crop(current_climate_crop,
Padova,
snap = 'in') %>%
mask(., Padova)
##============================================================================##
plot(current_climate_crop[[1]])
title('Mean annual temperature in Italy')
##===========================================================================##
## Sanity checks
##===========================================================================##
gr_boundary_data <- Greece %>% st_as_sf() %>%
st_set_precision(1000) %>%
st_make_valid() %>%
st_set_precision(1000) %>%
st_combine() %>%
st_union() %>%
st_set_precision(1000) %>%
st_make_valid() %>%
st_transform(st_crs(gr_pa_data)) %>%
st_make_valid()
##===========================================================================##
##===========================================================================##
## clip Greece's protected areas to the coastline
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
filter(MARINE == "terrestrial") %>%
st_intersection(gr_boundary_data) %>%
rbind(gr_pa_data %>%
filter(MARINE == "marine") %>%
st_difference(gr_boundary_data)) %>%
rbind(gr_pa_data %>% filter(!MARINE %in% c("terrestrial",
"marine")))
##===========================================================================##
##============================================================================##
## First, some main summary statistics
##============================================================================##
cellStats(current_climate_crop, 'summary')
##============================================================================##
## Then some density plots for some variables
##============================================================================##
density(current_climate_crop[[1:2]])
##============================================================================##
## And some boxplots for the same variables
##============================================================================##
boxplot(current_climate_crop[[1:2]])
##============================================================================##
## We can also calculate the correlation of those variables
##============================================================================##
pairs(current_climate_crop[[1:2]])
vifcor(current_climate_crop %>% as.data.frame(), th = 0.7)
##============================================================================##
## Then we convert the sp objects to sf objects
##============================================================================##
occs_Padova_sf <- occs_Padova %>% st_as_sf()
occs_Pescara_sf <- occs_Pescara %>% st_as_sf()
##============================================================================##
##============================================================================##
## Extract abiotic data
##============================================================================##
Padova_data <- extract(current_climate_crop, occs_Padova_sf, sp = T, df = T) %>%
as.data.frame()
Pescara_data <- extract(current_climate_crop, occs_Pescara_sf, sp = T, df = T) %>%
as.data.frame()
##============================================================================##
##============================================================================##
## We create a dataframe containing the mean values for each taxon for the annual
## temperature and precipitation
##============================================================================##
Padova_data_mean <- Padova_data %>%
group_by(Taxon) %>%
summarise(bio1_mean = mean(bio1),
bio2_mean = mean(bio2))
Pescara_data_mean <- Pescara_data %>%
group_by(Taxon) %>%
summarise(bio1_mean = mean(bio1),
bio2_mean = mean(bio2))
##============================================================================##
##============================================================================##
## We make sure that we keep ONLY those species that are both present in Padova
## and Pescara
##============================================================================##
Padova_subset <- setDT(Padova_data_mean)[Taxon %chin% Pescara_data_mean$Taxon]
Pescara_subset <- setDT(Pescara_data_mean)[Taxon %chin% Padova_data_mean$Taxon]
Padova_subset$Taxon <- factor(Padova_subset$Taxon)
Pescara_subset$Taxon <- factor(Pescara_subset$Taxon)
##============================================================================##
##============================================================================##
## Check if they are identical
##============================================================================##
identical(sort(unique(Padova_subset$Taxon)),
sort(unique(Pescara_subset$Taxon)))
##============================================================================##
## Convert them to tibbles
##============================================================================##
Padova_subset_tbl <- Padova_subset %>%
arrange(Taxon) %>%
as_tibble() %>%
mutate(Province = 'Padova')
Pescara_subset_tbl <- Pescara_subset %>%
arrange(Taxon) %>%
as_tibble() %>%
mutate(Province = 'Pescara')
Padova_Pescara <- full_join(Padova_subset_tbl, Pescara_subset_tbl)
Padova_Pescara$Province <- factor(Padova_Pescara$Province)
##============================================================================##
##============================================================================##
## Finally, we conduct the ANOVA
##============================================================================##
summary(aov(bio1_mean ~ Province, data = Padova_Pescara))
summary(aov(bio2_mean ~ Province, data = Padova_Pescara))
##===========================================================================##
## Do NOT run this code
##===========================================================================##
labs = knitr::all_labels(echo = TRUE)
##===========================================================================##References
Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V., & Pebesma, E.J. (2008) Applied spatial data analysis with r. Springer,
Lovelace, R., Nowosad, J., & Muenchow, J. (2019) Geocomputation with r. CRC Press,
McSweeney, C.F., Jones, R.G., Lee, R.W., & Rowell, D.P. (2015) Selecting CMIP5 GCMs for downscaling over multiple regions. Clim. Dyn., 44, 3237–3260.
There are two ways to do that fast:
1.library(pacman); p_load('package_name_1', 'package_name_2')
2.install.packages(c('package_name_1', 'package_name_2'), dependecies = TRUE)
Remember that if you use the first way, you need to install packagepacmanfirst.↩If you want to deepen your understanding on this, please click here↩
You are strongly advised to have a look to both these books↩
This step has already been conducted for you prior to this practical.↩
We can also use the
disaggregatefunction if we want to have a finer resolution, but this is strongly discouraged↩